Probabilistic base calling of Solexa sequencing data.

Abstract

BACKGROUND: Solexa/Illumina short-read ultra-high-throughput DNA sequencing technology produces millions of short tags (up to 36 bases) by parallel sequencing-by-synthesis of DNA colonies. The processing and statistical analysis of such high-throughput data pose new challenges; currently a fair proportion of the tags are routinely discarded because they cannot be matched to a reference sequence, which reduces the effective throughput of the technology.

RESULTS: We propose a novel base calling algorithm that uses model-based clustering and probability theory to identify ambiguous bases and code them with IUPAC symbols. We also select optimal sub-tags using a score based on information content to remove uncertain bases towards the ends of the reads.

CONCLUSION: We show that the method improves genome coverage and the number of usable tags by an average of 15% compared with Solexa's data processing pipeline. An R package is provided which allows fast and accurate base calling of Solexa's fluorescence intensity files and the production of informative diagnostic plots.
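The two ideas named in RESULTS, coding ambiguous bases with IUPAC symbols and trimming reads to an informative sub-tag, can be illustrated with a minimal Python sketch. This is not the authors' R package and does not reproduce their model-based clustering step: it assumes the per-base probabilities for A, C, G and T are already available, and the names `call_base`, `information_bits` and `best_subtag`, the probability `threshold` of 0.9, and the `cutoff` of 1 bit are all hypothetical choices made for illustration. The sub-tag score used here (a maximum-sum contiguous window over per-base information content) is one plausible reading of "a score based on information content", not necessarily the score used in the paper.

```python
import math

# Standard IUPAC nucleotide codes, keyed by the set of bases each one stands for.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def call_base(probs, threshold=0.9):
    """Return the IUPAC symbol for the smallest set of most probable bases
    whose total probability reaches `threshold` (a hypothetical parameter)."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    chosen, total = [], 0.0
    for base, p in ranked:
        chosen.append(base)
        total += p
        if total >= threshold:
            break
    return IUPAC[frozenset(chosen)]


def information_bits(probs):
    """Information content of one position: 2 bits minus the Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    return 2.0 - entropy


def best_subtag(read_probs, cutoff=1.0):
    """Return (start, end) of the contiguous sub-tag that maximises the sum of
    (information - cutoff); `cutoff` is an assumed penalty per base.
    Returns (0, 0) if no position is informative enough."""
    best_score, best_span = 0.0, (0, 0)
    run_score, run_start = 0.0, 0
    for i, probs in enumerate(read_probs):
        run_score += information_bits(probs) - cutoff
        if run_score <= 0:
            # The current run no longer helps; start a new one after position i.
            run_score, run_start = 0.0, i + 1
        elif run_score > best_score:
            best_score, best_span = run_score, (run_start, i + 1)
    return best_span


# Made-up probabilities for illustration only.
confident = {"A": 0.97, "C": 0.01, "G": 0.01, "T": 0.01}
purine = {"A": 0.50, "G": 0.45, "C": 0.03, "T": 0.02}
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

read = [confident, confident, purine, uniform, uniform]
start, end = best_subtag(read)
tag = "".join(call_base(p) for p in read[start:end])
```

With these made-up inputs, the confident position is called as `A`, the A/G position as the ambiguity code `R`, and the uniform position as `N`; the selected sub-tag drops the uncertain bases at the end of the read. Raising `threshold` produces more ambiguity codes, and raising `cutoff` produces shorter sub-tags.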